 local smoothness


Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence

Liu, Yuxing, Ge, Yuze, Pan, Rui, Kang, An, Zhang, Tong

arXiv.org Artificial Intelligence

Learning rate warmup is a popular and practical technique in training large-scale deep neural networks. Despite the huge success in practice, the theoretical advantages of this strategy of gradually increasing the learning rate at the beginning of the training process have not been fully understood. To resolve this gap between theory and practice, we first propose a novel family of generalized smoothness assumptions, and validate its applicability both theoretically and empirically. Under the novel smoothness assumption, we study the convergence properties of gradient descent (GD) in both deterministic and stochastic settings. It is shown that learning rate warmup consistently accelerates GD, and GD with warmup can converge at most $\Theta(T)$ times faster than with a non-increasing learning rate schedule in some specific cases, providing insights into the benefits of this strategy from an optimization theory perspective.
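As a rough illustration of what "gradually increasing the learning rate" means, the following minimal sketch runs gradient descent with a linear warmup schedule on a toy objective. The function names, the linear ramp, and the objective f(x) = x^4 / 4 are assumptions made for illustration and are not taken from the paper; the toy only shows that a small initial step can help when curvature is large far from the optimum.

```python
def warmup_schedule(step, warmup_steps, peak_lr):
    """Linear warmup (illustrative): ramp up to peak_lr over warmup_steps, then hold."""
    if warmup_steps <= 0:
        return peak_lr
    return peak_lr * min(1.0, (step + 1) / warmup_steps)


def gradient_descent(grad, x0, num_steps, lr_at):
    """Plain scalar GD: x <- x - lr_at(t) * grad(x)."""
    x = x0
    for t in range(num_steps):
        x -= lr_at(t) * grad(x)
    return x


def grad_f(x):
    """Gradient of the toy objective f(x) = x**4 / 4 (hypothetical example)."""
    return x ** 3


# With warmup: early steps are small while the curvature 3 * x**2 is still large.
x_warm = gradient_descent(
    grad_f, x0=3.0, num_steps=200,
    lr_at=lambda t: warmup_schedule(t, warmup_steps=50, peak_lr=0.5),
)

# Constant peak learning rate from the same start: only 3 steps are run here,
# because the iterates grow rapidly in magnitude.
x_const = gradient_descent(grad_f, x0=3.0, num_steps=3, lr_at=lambda t: 0.5)
```

In this toy run the warmed-up iterate moves toward the minimizer at 0, while the constant-rate iterate overshoots on the first step and keeps growing.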


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Submitted by Assigned_Reviewer_1. Q1: The authors propose a non-uniform sampling scheme for variance reduced SGD type algorithms based on local smoothness and the fact that the gradient of many individual losses is constant. The authors show that such a scheme is able to outperform uniform sampling for SVRG and SDCA. Overall the idea is an interesting one and seems to perform well in practice. However, I feel that the paper has some major clarity issues. In general, I find the paper quite difficult to read.


Anderson Acceleration in Nonsmooth Problems: Local Convergence via Active Manifold Identification

Li, Kexin, Bai, Luwei, Wang, Xiao, Wang, Hao

arXiv.org Artificial Intelligence

Anderson acceleration is an effective technique for enhancing the efficiency of fixed-point iterations; however, analyzing its convergence in nonsmooth settings presents significant challenges. In this paper, we investigate a class of nonsmooth optimization algorithms characterized by the active manifold identification property. This class includes a diverse array of methods such as the proximal point method, proximal gradient method, proximal linear method, proximal coordinate descent method, Douglas-Rachford splitting (or the alternating direction method of multipliers), and the iteratively reweighted $\ell_1$ method, among others. Under the assumption that the optimization problem possesses an active manifold at a stationary point, we establish a local R-linear convergence rate for the Anderson-accelerated algorithm. Our extensive numerical experiments further highlight the robust performance of the proposed Anderson-accelerated methods.
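For readers unfamiliar with the technique, here is a minimal sketch of Anderson acceleration with memory depth one applied to a generic fixed-point map x = g(x). The function name anderson_m1, the depth, and the smooth toy map are illustrative assumptions; the sketch does not reproduce the nonsmooth setting or the algorithms analyzed in the paper.

```python
import math


def anderson_m1(g, x0, tol=1e-10, max_iter=100):
    """Anderson acceleration with depth 1 for x = g(x); vectors are Python lists."""
    x_prev = list(x0)
    gx_prev = g(x_prev)
    f_prev = [a - b for a, b in zip(gx_prev, x_prev)]  # residual g(x) - x
    x = gx_prev  # the first step is a plain fixed-point step
    for _ in range(max_iter):
        gx = g(x)
        f = [a - b for a, b in zip(gx, x)]
        if math.sqrt(sum(v * v for v in f)) < tol:
            return x
        df = [a - b for a, b in zip(f, f_prev)]
        denom = sum(v * v for v in df)
        if denom == 0.0:
            x_next = gx  # fall back to a plain step if the residuals coincide
        else:
            # Least-squares mixing weight that minimizes ||f - gamma * df||
            gamma = sum(a * b for a, b in zip(f, df)) / denom
            x_next = [a - gamma * (a - b) for a, b in zip(gx, gx_prev)]
        x_prev, gx_prev, f_prev = x, gx, f
        x = x_next
    return x


# Toy usage: solve x = cos(x) in one dimension.
solution = anderson_m1(lambda v: [math.cos(v[0])], [1.0])
```

In one dimension this update coincides with the secant method applied to g(x) - x, which is why it reaches the fixed point in far fewer steps than the plain iteration x <- cos(x).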


Local Smoothness in Variance Reduced Optimization

Neural Information Processing Systems

We propose a family of non-uniform sampling strategies to provably speed up a class of stochastic optimization algorithms with linear convergence, including Stochastic Variance Reduced Gradient (SVRG) and Stochastic Dual Coordinate Ascent (SDCA). For a large family of penalized empirical risk minimization problems, our methods exploit data-dependent local smoothness of the loss functions near the optimum, while maintaining convergence guarantees. Our bounds are the first to quantify the advantage gained from local smoothness, which is significant for some problems. Empirically, we provide thorough numerical results to back up our theory. Additionally, we present algorithms exploiting local smoothness in more aggressive ways, which perform even better in practice.
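The general idea behind non-uniform sampling can be sketched as follows: examples are drawn with unequal probabilities, and each sampled gradient is reweighted so that the estimate stays unbiased. The abstract does not state the exact sampling rule, so drawing example i with probability proportional to a per-example smoothness estimate is an assumption here, and all names and numbers are hypothetical.

```python
import random


def sampling_probabilities(smoothness):
    """Probabilities proportional to per-example smoothness estimates (assumed rule)."""
    total = sum(smoothness)
    return [s / total for s in smoothness]


def importance_weighted_gradient(grads, probs, rng):
    """Draw index i with probability probs[i] and return grads[i] / (n * probs[i]).

    The reweighting by 1 / (n * probs[i]) makes the expectation of the returned
    value equal to the average gradient, regardless of the probabilities used.
    """
    n = len(grads)
    i = rng.choices(range(n), weights=probs, k=1)[0]
    return grads[i] / (n * probs[i])


# Toy usage with scalar per-example gradients and made-up smoothness estimates.
grads = [0.2, -1.0, 3.0, 0.5]
smoothness = [1.0, 4.0, 10.0, 1.0]
probs = sampling_probabilities(smoothness)
rng = random.Random(0)
estimate = importance_weighted_gradient(grads, probs, rng)

# Exact expectation of the estimator, computed without sampling.
expected = sum(p * (g / (len(grads) * p)) for g, p in zip(grads, probs))
```

Here expected equals the plain average of grads up to floating-point error, which is the unbiasedness property that lets a variance-reduced method keep its convergence guarantee under a non-uniform sampling distribution.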


Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models

Luo, Zihao, Xu, Xilie, Liu, Feng, Koh, Yun Sing, Wang, Di, Zhang, Jingfeng

arXiv.org Artificial Intelligence

Low-rank adaptation (LoRA) is an efficient strategy for adapting latent diffusion models (LDMs) on a private dataset to generate specific images by minimizing the adaptation loss. However, LoRA-adapted LDMs are vulnerable to membership inference (MI) attacks, which can judge whether a particular data point belongs to the private dataset and thus lead to privacy leakage. To defend against MI attacks, we first propose a straightforward solution: Membership-Privacy-preserving LoRA (MP-LoRA). MP-LoRA is formulated as a min-max optimization problem where a proxy attack model is trained by maximizing its MI gain while the LDM is adapted by minimizing the sum of the adaptation loss and the MI gain of the proxy attack model. However, we empirically find that MP-LoRA suffers from unstable optimization, and our theoretical analysis suggests that the potential reason is its unconstrained local smoothness, which impedes the privacy-preserving adaptation. To mitigate this issue, we further propose a Stable Membership-Privacy-preserving LoRA (SMP-LoRA) that adapts the LDM by minimizing the ratio of the adaptation loss to the MI gain. In addition, we theoretically prove that the local smoothness of SMP-LoRA can be constrained by the gradient norm, leading to improved convergence. Our experimental results corroborate that SMP-LoRA can indeed defend against MI attacks and generate high-quality images. Our code is available at https://github.com/WilliamLUO0/StablePrivateLoRA.


Local Smoothness in Variance Reduced Optimization

Tong Zhang; Dept. of Operations Research & Financial Engineering, Princeton University, Princeton, NJ 08544; Dept. of Statistics, Rutgers University

Neural Information Processing Systems


Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for the Model-free LQR

Toso, Leonardo F., Zhan, Donglin, Anderson, James, Wang, Han

arXiv.org Artificial Intelligence

One of the main successes of Reinforcement Learning (RL) (for example, in the context of robotics) is its ability to learn control policies that rapidly adapt to different agents and environments (Wang et al., 2016; Duan et al., 2016; Rothfuss et al., 2018). This idea of learning a control policy that efficiently adapts to unseen RL tasks is referred to as meta-learning, or learning to learn. The most popular approach is Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017, 2019). In the context of RL, the role of MAML is to exploit the diversity of RL tasks drawn from a common task distribution to learn a control policy, in a multi-task and heterogeneous setting, that is only a few policy gradient (PG) steps away from the optimal policy of an unseen task. Despite its success in image classification and RL, more needs to be understood about the theoretical convergence guarantees of MAML for both model-based and model-free learning.


SceneDM: Scene-level Multi-agent Trajectory Generation with Consistent Diffusion Models

Guo, Zhiming, Gao, Xing, Zhou, Jianlan, Cai, Xinyu, Shi, Botian

arXiv.org Artificial Intelligence

Realistic scene-level multi-agent motion simulations are crucial for developing and evaluating self-driving algorithms. However, most existing works focus on generating trajectories for a single agent type, and typically ignore the consistency of generated trajectories. In this paper, we propose a novel framework based on diffusion models, called SceneDM, to generate joint and consistent future motions of all the agents in a scene, including vehicles, bicycles, pedestrians, etc. To enhance the consistency of the generated trajectories, we resort to a new Transformer-based network to effectively handle agent-agent interactions in the inverse process of motion diffusion. To account for the smoothness of agent trajectories, we further design a simple yet effective consistent diffusion approach that improves the model's ability to exploit short-term temporal dependencies. Furthermore, a scene-level scoring function is attached to evaluate the safety and road-adherence of the generated agents' motions and help filter out unrealistic simulations. Finally, SceneDM achieves state-of-the-art results on the Waymo Sim Agents Benchmark. The project webpage is available at https://alperen-hub.github.io/SceneDM.


Local transfer learning from one data space to another

Mhaskar, H. N., O'Dowd, Ryan

arXiv.org Artificial Intelligence

A fundamental problem in manifold learning is to approximate a functional relationship in data chosen randomly from a probability distribution supported on a low-dimensional sub-manifold of a high-dimensional ambient Euclidean space. The manifold is essentially defined by the data set itself and, typically, designed so that the data is dense on the manifold in some sense. The notion of a data space is an abstraction of a manifold encapsulating the essential properties that allow for function approximation. The problem of transfer learning (meta-learning) is to use the learning of a function on one data set to learn a similar function on a new data set. In terms of function approximation, this means lifting a function on one data space (the base data space) to another (the target data space). This viewpoint enables us to connect some inverse problems in applied mathematics (such as the inverse Radon transform) with transfer learning. In this paper we examine the question of such lifting when the data is assumed to be known only on a part of the base data space. We are interested in determining subsets of the target data space on which the lifting can be defined, and how the local smoothness of the function and that of its lifting are related.


A Segment-Wise Gaussian Process-Based Ground Segmentation With Local Smoothness Estimation

Mehrabi, Pouria, Taghirad, Hamid D.

arXiv.org Artificial Intelligence

In both terrestrial and extraterrestrial environments, a precise and informative model of the ground and the surface ahead is crucial for navigation and obstacle avoidance. The ground surface is not always flat; it may be sloped, bumpy, and rough, especially in off-road terrestrial scenes. In bumpy and rough scenes, the functional relationship of the surface-related features may vary across different areas of the ground, as the structure of the ground surface may change suddenly and, further, the measured point cloud of the ground is not smooth. Thus, the ground-related features must be obtained based on local estimates or even point estimates. To tackle this problem, a segment-wise GP-based ground segmentation method with local smoothness estimation is proposed. This method is an extension of our previous method, in which a realistic measurement of the length-scale values was provided for the covariance kernel in each line segment to give a precise estimate of the ground for sloped terrains. In this extension, the value of the length-scale is estimated locally for each data point, which makes it much more precise for rough scenes without being computationally complex, and more robust to under-segmentation, sparsity, and under-representability. The segment-wise task is performed to estimate a partial continuous model of the ground for each radial range segment. Simulation results show the effectiveness of the proposed method in giving a continuous and precise estimate of the ground surface in rough and bumpy scenes while being fast enough for real-world applications.